Subgraph Mining
نویسنده
چکیده
The amount of available data is increasing very fast. With this data, the desire for data mining is also growing. More and larger databases have to be searched to find interesting (and frequent) elements and connections between them. Most often the data of interest is very complex. It is common to model complex data with the help of graphs consisting of nodes and edges that are often labeled to store additional information. Having a graph database, the main goal is to find connections and similarities between its graphs. Based on these connections and similarities, the graphs can be categorized, clustered or changed according to the application area. Regularly occurring patterns in the form of subgraphs —called fragments in this context—that appear at least in a certain percentage of graphs, are a common method to analyze graph databases. The actual occurrence of a fragment in a database graph is called embedding. Finding the fragments and their embeddings is the goal of subgraph mining described in detail in this chapter. The first published graph mining algorithm, called Subdue, appeared in the mid-1990s and is still used in different application areas and was extended in several ways. (Cook & Holder, 2000). Subdue is based on a heuristic search and does not find all possible fragments and embeddings. It took a few more years before more and faster approaches appeared. In (Helma, Kramer, & de Raedt, 2002) graph databases are mined for simple paths, for a lot of other applications only trees are of interest (Rückert & Kramer, 2004). Also Inductive Logic Programming (Finn et al., 1998) was applied in this area. At the beginning of the new millennium finally more and more and every time faster approaches for general mining of graph databases were developed that were able to find all possible fragments. (Borgelt & Berthold, 2002; Yan & Han, 2002; Kuramochi & Karypis, 2001; Nijssen & Kok, 2004). Several different application areas for graph mining are researched. The most common area is mining molecular databases where the molecules are displayed by their two-dimensional structure. When analyzing molecules it is interesting to find patterns that might explain why a certain set of molecules is useful as a drug against certain diseases (Borgelt & Berthold, 2002). Similar problems occur for protein databases. Here graph data mining can be used to find structural patterns in the primary, secondary and tertiary structure of protein categories (Cook & Holder, 2000). Another application area are web searches (Cook, Manocha, & Holder, 2003). Existing search engines use linear feature matches. Using graphs as underlying data structure, nodes represent pages, documents or document keywords and edges represent links between them. Posing a query as a graph means a smaller graph has to be embedded in the larger one. The graph modeling the data structure can be mined to find similar clusters. Quite new is the application of subgraph mining in optimizing code for embedded devices. With the help of so-called procedural abstraction, the size of pre-compiled binaries can be reduced which is often crucial because of the limited storage capacities of embedded systems. There, subgraph mining helps identifying common structures in the program’s control flow graph which can then be combined (“abstracted”) into a single procedure (Dreweke et.al., 2007).
منابع مشابه
Efficient Frequent Connected Induced Subgraph Mining in Graphs of Bounded Tree-Width
We study the frequent connected induced subgraph mining problem, i.e., the problem of listing all connected graphs that are induced subgraph isomorphic to a given number of transaction graphs. We first show that this problem cannot be solved for arbitrary transaction graphs in output polynomial time (if P 6= NP) and then prove that for graphs of bounded tree-width, frequent connected induced su...
متن کاملMining Density Contrast Subgraphs
Dense subgraph discovery is a key primitive in many graph mining applications, such as detecting communities in social networks and mining gene correlation from biological data. Most studies on dense subgraph mining only deal with one graph. However, in many applications, we have more than one graph describing relations among a same group of entities. In this paper, given two graphs sharing the...
متن کاملFrequent subgraph mining algorithms on weighted graphs
This thesis describes research work undertaken in the field of graph-based knowledge discovery (or graph mining). The objective of the research is to investigate the benefits that the concept of weighted frequent subgraph mining can offer in the context of the graph model based classification. Weighted subgraphs are graphs where some of the vertexes/edges are considered to be more significant t...
متن کاملMinimal Vertex Unique Labelled Subgraph Mining
This paper introduces the concept of Vertex Unique Labelled Subgraph Mining (VULSM), a specialised form of subgraph mining. A VULS is a subgraph defined by a set of edge labels that has a unique vertex labelling associated with it. A minimal VULS is then a VULS which is not a supergraph of any other VULS. The application considered in this paper, for evaluation purposes, is error prediction wit...
متن کاملMining Frequent Graph Sequence Patterns Induced by Vertices
The mining of a complete set of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences (dynamic graphs or evolving graphs). In this paper, we define a novel class of subgraph subsequence called an “induced subgraph subsequence” to enable efficient mining of a complete set of freq...
متن کاملA Comparative Study of F Requent Subgraph M Ining a Lgorithms
Data mining algorithms are facing the challenge to deal with an increasing number of complex objects. Graph is a natural data structure used for modeling complex objects. Frequent subgraph mining is another active research topic in data mining . A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph data...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009